Semi-Supervised Never-Ending Learning in Rhetorical Relation Identification

نویسندگان

  • Erick Galani Maziero
  • Graeme Hirst
  • Thiago Alexandre Salgueiro Pardo
چکیده

Some languages do not have enough labeled data to obtain good discourse parsing, specially in the relation identification step, and the additional use of unlabeled data is a plausible solution. A workflow is presented that uses a semi-supervised learning approach. Instead of only a predefined additional set of unlabeled data, texts obtained from the web are continuously added. This obtains near human perfomance (0.79) in intra sentential rhetorical relation identification. An experiment for English also shows improvement using a similar workflow.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Toward Never Ending Language Learning

We report research toward a never-ending language learning system, focusing on a first implementation which learns to classify occurrences of noun phrases according to lexical categories such as “city” and “university.” Our experiments suggest that the accuracy of classifiers produced by semi-supervised learning can be improved by coupling the learning of multiple classes based on background kn...

متن کامل

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Our inspiration comes from Nell (Never Ending Language Learning), a computer program running at Carnegie Mellon University to extract structured information from unstructured web pages. We consider the problem of semi-supervised learning approach to extract category instances (e.g. country(USA), city(New York)) from web pages, starting with a handful of labeled training examples of each categor...

متن کامل

Learning Connective-based Word Representations for Implicit Discourse Relation Identification

We introduce a simple semi-supervised approach to improve implicit discourse relation identification. This approach harnesses large amounts of automatically extracted discourse connectives along with their arguments to construct new distributional word representations. Specifically, we represent words in the space of discourse connectives as a way to directly encode their rhetorical function. E...

متن کامل

A Granular - based Approach for Semisupervised Web Information Labeling

A key issue when mining web information is the labeling problem: data is abundant on the web but is unlabelled. In this thesis, we address this problem by proposing i) a novel theoretical granular model that structures categorical noun phrase instances as well as semantically related noun phrase pairs from a given corpus representing unstructured web pages with a variant of Tolerance Rough Sets...

متن کامل

Bootstrapping Biomedical Ontologies for Scientific Text using NELL

We describe an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner) (Carlson et al., 2010), a system designed for extraction from Web text. NELL uses a coupled semi-supervised bootstrapping approach to learn new facts from text, given an initial ontology and a small number of “seeds” for each ontology category. In contrast to previous applicat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015